PREMIER - PRobabilistic error-correction using Markov inference in errored reads

Authors

  • Xin Yin
  • Zhao Song
  • Karin S. Dorman
  • Aditya Ramamoorthy
Abstract

THIS PAPER IS ELIGIBLE FOR THE STUDENT PAPER AWARD. In this work we present a flexible, probabilistic and reference-free method of error correction for high-throughput DNA sequencing data. The key is to exploit the high coverage of sequencing data and model short sequence outputs as independent realizations of a Hidden Markov Model (HMM). We pose the problem of error correction of reads as one of maximum likelihood sequence detection over this HMM. While time and memory considerations rule out an implementation of the optimal Baum-Welch algorithm (for parameter estimation) and the optimal Viterbi algorithm (for error correction), we propose low-complexity approximate versions of both. Specifically, we propose an approximate Viterbi algorithm and a sequential-decoding-based algorithm for the error correction. Our results show that, compared with Reptile, a state-of-the-art error correction method, our methods consistently achieve superior performance on both simulated and real data sets.
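To make the idea of maximum likelihood sequence detection over an HMM concrete, here is a minimal sketch of exact Viterbi decoding on a toy model. It is not the PREMIER algorithm: the abstract states that exact Viterbi is too expensive at sequencing scale and that the authors use approximate variants instead. Everything specific below is an assumption made for illustration: hidden states are single true bases, the channel produces substitution errors only, and the transition probabilities are hand-picked by the hypothetical helper `toy_model` rather than estimated with Baum-Welch.

```python
import math

BASES = "ACGT"


def toy_model():
    """Hypothetical parameters: a 'genome' that strongly favours A->C->G->T->A."""
    init = {b: 0.25 for b in BASES}
    trans = {}
    for i, p in enumerate(BASES):
        nxt = BASES[(i + 1) % 4]
        # 0.97 for the favoured successor, 0.01 for each other base (rows sum to 1).
        trans[p] = {s: (0.97 if s == nxt else 0.01) for s in BASES}
    return init, trans


def viterbi_correct(read, init, trans, error_rate=0.01):
    """Return the most likely true base sequence for `read` under the toy HMM."""

    def log_emit(true_base, observed):
        # Assumed substitution-only channel: the observed base is correct with
        # probability 1 - error_rate, otherwise uniform over the other three.
        p = 1.0 - error_rate if true_base == observed else error_rate / 3.0
        return math.log(p)

    # score[s]: best log-probability of any state path ending in state s.
    score = {s: math.log(init[s]) + log_emit(s, read[0]) for s in BASES}
    backpointers = []
    for obs in read[1:]:
        prev, score, pointers = score, {}, {}
        for s in BASES:
            best = max(BASES, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + log_emit(s, obs)
            pointers[s] = best
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(BASES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


init, trans = toy_model()
print(viterbi_correct("ACGTACTTACGT", init, trans))  # prints ACGTACGTACGT
```

In this toy setting the seventh base of the read (`T`) is replaced by `G`, because a single substitution error is far more likely under the assumed parameters than two improbable transitions. The table-filling step costs time proportional to read length times the square of the number of states, which hints at why exact decoding becomes impractical once the state space is much larger than four.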


Similar resources

Probabilistic insertion, deletion and substitution error correction using Markov inference in next generation sequencing reads

Error correction of noisy reads obtained from high-throughput DNA sequencers is an important problem since read quality significantly affects downstream analyses such as detection of genetic variation and the complexity and success of sequence assembly. Most of the current error correction algorithms are only capable of recovering substitution errors. In this work, Pindel, an algorithm that sim...


Introduction to Probabilistic Graphical Models

Over the last decades, probabilistic graphical models have become the method of choice for representing uncertainty in machine learning. They are used in many research areas such as computer vision, speech processing, time-series and sequential data modelling, cognitive science, bioinformatics, probabilistic robotics, signal processing, communications and error-correcting coding theory, and in ...


Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data

Advances in genomics allow researchers to measure the complete set of transcripts in cells. These transcripts include mRNAs (which encode for proteins) and microRNAs, short RNAs that play a regulatory role in cellular networks. While this data is a great resource for reconstructing the activity of networks in the cell, it also presents several challenges. These challenges begin with the data co...


SlimShot: In-Database Probabilistic Inference for Knowledge Bases

Increasingly large Knowledge Bases are being created, by crawling the Web or other corpora of documents, and by extracting facts and relations using machine learning techniques. To manage the uncertainty in the data, these KBs rely on probabilistic engines based on Markov Logic Networks (MLN), for which probabilistic inference remains a major challenge. Today’s state of the art systems use vari...


SlimShot: Probabilistic Inference for Web-Scale Knowledge Bases

Increasingly large Knowledge Bases are being created, by crawling the Web or other corpora of documents, and by extracting facts and relations using machine learning techniques. To manage the uncertainty in the data, these KBs rely on probabilistic engines based on Markov Logic Networks (MLN), for which probabilistic inference remains a major challenge. Today’s state of the art systems use vari...




Publication date: 2013